Performance Evaluation of Tiling for the Register Level

Authors

Marta Jiménez

José María Llabería

Agustin Fernández

Abstract

Tiling is a well-known loop transformation that is mainly used to expose coarse-grain parallelism and to exploit data reuse at the cache level. However, it can also be used to exploit data reuse at the register level and to improve a program's instruction-level parallelism (ILP). Previous work on tiling, as well as commercial compilers, can perform tiling for the register level in more than one dimension when the iteration space is rectangular. Non-rectangular iteration spaces, however, are common in linear algebra algorithms and can also arise from previously applied transformations such as loop skewing. In this paper we evaluate the technique we presented in [11], which can perform tiling for the register level in more than one dimension in both rectangular and non-rectangular iteration spaces. We use typical linear algebra algorithms with non-rectangular iteration spaces as benchmarks and compare our proposal against commercial preprocessors that perform optimizing code transformations such as inner unrolling, outer unrolling and software pipelining. We also present quantitative data showing the benefits of tiling only for the register level, tiling only for the cache level, and tiling for both levels simultaneously. Results measured on an ALPHA 21164 processor show that tiling for both the cache and register levels outperforms commercial compilers and preprocessors by factors ranging from 1.3 to 6.3.
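To illustrate what register-level tiling of a non-rectangular iteration space involves, the C sketch below tiles a lower-triangular matrix-vector product, whose iteration space 0 ≤ i < n, 0 ≤ j ≤ i is triangular. It is a minimal sketch under stated assumptions, not the technique of [11]: the function names `trmv_ref` and `trmv_reg2x2`, the row-major n×n storage, and the fixed 2×2 tile are hypothetical choices made for illustration. Within a tile, each `x[j]` is loaded once and used by two rows, and the two partial sums live in scalar variables that a compiler can keep in registers. The triangular boundary is handled by peeling the single corner iteration that only row i+1 executes.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: y = L * x for an n-by-n lower-triangular matrix L
 * stored row-major in a full n*n array (entries above the diagonal are
 * never read). Iteration space: 0 <= i < n, 0 <= j <= i. */
void trmv_ref(int n, const double *L, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double acc = 0.0;
        for (int j = 0; j <= i; j++)
            acc += L[(size_t)i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Hypothetical register-tiled version with 2x2 tiles: rows are processed
 * in pairs (unroll-and-jam of the i loop) and columns in pairs (unrolling
 * of the j loop). Each x value is loaded once and reused by both rows. */
void trmv_reg2x2(int n, const double *L, const double *x, double *y)
{
    assert(n >= 0);
    int i = 0;
    for (; i + 1 < n; i += 2) {
        const double *r0 = L + (size_t)i * n;        /* row i   */
        const double *r1 = L + (size_t)(i + 1) * n;  /* row i+1 */
        double y0 = 0.0, y1 = 0.0;                   /* register accumulators */
        int j = 0;
        /* Full 2x2 tiles: columns j and j+1 are both <= i, so they
         * belong to rows i and i+1 (the rectangular part). */
        for (; j + 1 <= i; j += 2) {
            double x0 = x[j], x1 = x[j + 1];
            y0 += r0[j] * x0 + r0[j + 1] * x1;
            y1 += r1[j] * x0 + r1[j + 1] * x1;
        }
        /* Leftover column j == i: i is even, so both rows share an odd
         * number of columns and exactly one remains here. */
        for (; j <= i; j++) {
            double x0 = x[j];
            y0 += r0[j] * x0;
            y1 += r1[j] * x0;
        }
        /* Triangular corner: only row i+1 reaches column i+1. */
        y1 += r1[i + 1] * x[i + 1];
        y[i] = y0;
        y[i + 1] = y1;
    }
    /* Odd n: the last row is left untiled. */
    if (i < n) {
        const double *r = L + (size_t)i * n;
        double acc = 0.0;
        for (int j = 0; j <= i; j++)
            acc += r[j] * x[j];
        y[i] = acc;
    }
}
```

With small integer-valued inputs every partial sum is exact, so `trmv_reg2x2` and `trmv_ref` produce identical results; with general floating-point data the tiled version reassociates the sums, so results may differ in the last bits. Larger tiles expose more reuse and ILP but need more registers and more boundary code, which is where handling non-rectangular spaces becomes harder.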

Similar resources

The Efficacy of an SFL-Oriented Register Instruction in Improving Iranian EFL Learners’ Writing Performance and Perception: Language Proficiency in Focus

The current study sought to explore the impact of SFL-oriented register instruction on Iranian EFL learners’ writing performance with a central focus on their English proficiency level. As its secondary aim, the study delved deeply into the learners’ perception of the register-based instruction. To these ends, 50 intermediate and 50 advanced Iranian EFL learners were selected randomly and assign...

Hierarchical tiling for improved superscalar performance

It takes more than a good algorithm to achieve high performance: inner-loop performance and data locality are also important. Tiling is a well-known method for parallelization and for improving data locality. However, tiling has the potential of being even more beneficial. At the finest granularity, it can be used to guide register allocation and instruction scheduling; at the coarsest level, it c...

Optimized Dense Matrix Multiplication on a Many-Core Architecture

Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), belong to a new set of manycore-on-a-chip systems with a software managed memory hierarchy. New programming and compiling methodologies are required to fully exploit the potential of this new class of architectures. In this pape...

PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests

Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tu...

The Deleterious Nature of Interacting Tiling Optimizations

A compiler may perform multiple optimizations, each with its own goal and cost function. While it is acknowledged that optimizations can interact, in practice the interactions are often ignored, and assumed to have no deleterious effects. In this paper, we demonstrate for optimizations involving tiling that the interactions have unexpectedly harmful effects on overall performance. Current trends ...

Publication date: 1998